- Title
- Efficient incident identification from multi-dimensional issue reports via meta-heuristic search
- Creator
- Gu, Jiazhen; Luo, Chuan; Qin, Si; Qiao, Bo; Lin, Qingwei; Zhang, Hongyu; Li, Ze; Dang, Yingnong; Cai, Shaowei; Wu, Wei; Zhou, Yangfan; Chintalapati, Murali; Zhang, Dongmei
- Relation
- 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '20). Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '20) (8-13 November 2020), pp. 292-303
- Relation
- ARC.DP200102940 http://purl.org/au-research/grants/arc/DP200102940
- Publisher Link
- http://dx.doi.org/10.1145/3368089.3409741
- Publisher
- Association for Computing Machinery
- Resource Type
- conference paper
- Date
- 2020
- Description
- In large-scale cloud systems, unplanned service interruptions and outages may severely degrade service availability. Such incidents can occur in bursts, which erodes user satisfaction. Identifying incidents rapidly and accurately is critical to the operation and maintenance of a cloud system. In industrial practice, incidents are typically detected by analyzing issue reports, which are generated over time by monitoring cloud services. Identifying incidents in a large number of issue reports is challenging: an issue report is typically multi-dimensional, with many categorical attributes, and it is difficult to identify the specific attribute combination that indicates an incident. Existing methods generally rely on pruning-based search, which is time-consuming on high-dimensional data and thus impractical for incident detection in large-scale cloud systems. In this paper, we propose MID (Multi-dimensional Incident Detection), a novel framework for identifying incidents from large volumes of multi-dimensional issue reports effectively and efficiently. The key to MID's design is encoding the problem as a combinatorial optimization problem. A specifically tailored meta-heuristic search method is then designed to rapidly identify attribute combinations that indicate incidents. We evaluate MID with extensive experiments using both synthetic data and real-world data collected from a large-scale production cloud system. The experimental results show that MID significantly outperforms the current state-of-the-art methods in terms of effectiveness and efficiency. Additionally, MID has been successfully applied to Microsoft's cloud systems and has helped greatly reduce manual maintenance effort.
- Subject
- incident detection; emerging issue; meta-heuristic search; effective combination; problem identification
- Identifier
- http://hdl.handle.net/1959.13/1443911
- Identifier
- uon:42143
- Identifier
- ISBN:9781450370431
- Language
- eng